† Corresponding author. E-mail:
Project supported by the National Natural Science Foundation of China (Grant Nos. 11774011, 11434001, U1530401, and U1430237).
The CryoEM single particle structure determination method has recently received broad attention in the field of structural biology. Structures can be resolved to near-atomic resolution after model reconstruction from a large number of CryoEM images that measure molecules in different orientations. However, the factors that determine the resolution of the reconstructed map need to be further explored. Here, we provide a theoretical framework in conjunction with numerical simulations to gauge the influence of several key factors on CryoEM map resolution. If the projection image quality allows orientation assignment, then the number of measured projection images and the quality of each measurement (quantified using the average signal-to-noise ratio) can be combined into a single factor, which dominates the resolution of the reconstructed maps. Furthermore, the intrinsic thermal motion of molecules has significant effects on the resolution. These effects can be quantitatively summarized with an analytical formula that provides a theoretical guideline on structure resolutions for given experimental measurements.
The cryo-electron microscopy (CryoEM) single particle imaging method has recently become popular in the structural biology research community.[1] The basic idea of CryoEM is to measure the particles at all possible orientations and use computational model reconstruction algorithms to build a three-dimensional (3D) structure that best satisfies the overall measurements. Because of irradiation damage from high-energy electrons during the measurements, each particle or molecule can only tolerate a limited electron dose before it deteriorates. Experimentally, each particle/molecule is only measured once at a given orientation that is nearly fixed in vitreous ice. The CryoEM single particle imaging method spreads the electron dose over a large ensemble of molecules, and each molecule scatters a tolerable number of electrons to form a magnified image. Meanwhile, the cryogenic environment protects the sample molecules, maintaining molecular integrity. Nonetheless, the model resolutions from CryoEM experiments were not close to those obtained from the x-ray crystallography method until the recent breakthroughs in three areas, namely: (i) the invention of the direct electron detector, which allows accurate and fast measurement of electrons;[2] (ii) the development of data processing software, in particular the application of Bayesian algorithms in reconstructions, backed by high-performance computers;[3–5] and (iii) the advances in sample preparation that allow measurements at diverse orientations and at improved signal levels using thin vitreous ice layers.[6–8] The fast readout rate of the new direct electron detectors also enables measurements in movie mode, which makes it possible to correct for molecular drift during data collection and sharpen the blurred signals.[9,10] For a long time, only particles with high symmetry, such as virus particles, could be determined to high resolution using the CryoEM single particle imaging method.[11–13] However, since the structural determination of the TRPV1 molecule at 3.4 Å,[14] many high-resolution structures of molecular complexes have been determined using the CryoEM single particle imaging method. This technology is enriching the protein structure database, particularly with large molecular complexes.[15–21]
Despite the advances in the CryoEM single particle imaging method, some fundamental questions remain. One question we would like to address here concerns the factors that determine the model resolution. In crystallography, the concept and measures of resolution have been well established,[22,23] while they are still under investigation in CryoEM. In general, the resolutions of the maps determined using the CryoEM approach are estimated using model consistency; i.e., by calculating the Fourier shell correlation (FSC) profiles and examining the point where the signal disappears.[24–27] Recently, several alternative methods have been developed to assess the map resolutions, such as the approach that checks the local details of the structure.[28] For maps determined to near-atomic levels (better than 3 Å), the reconstructed electron density maps can be visually inspected to check the model accuracy. Regardless of the different definitions of model resolution, the correct interpretation of the reconstructed models is subject to validation using complementary approaches, such as biochemical assays or single molecule fluorescence experiments. Putting aside the arguments on CryoEM map resolutions obtained with different approaches, we would like to focus on the factors that determine the model resolution and hope to obtain a theoretical framework that guides experiments to improve the resolution using optimized protocols for data collection.
In this work, we investigated three factors that influence model resolutions, namely, the number of projection measurements, the signal-to-noise ratio (SNR) of individual measurements, and the intrinsic flexibility of molecules. Early studies have provided important clues about how these factors may contribute to the model resolutions. For example, Henderson studied the resolution limits resulting from electron microscopy and provided a relation between the resolution and the number of projections.[29] Later, Rosenthal and Henderson formulated a more detailed equation (referred to as the RH model hereafter) to estimate the number of projections required for different structure resolutions.[26] In the RH model, the electron dose, SNR, molecular symmetry, and an effective B-factor were considered.[26,30,31] The effective B-factor was found to be essential for fitting experimental data because it models the combined effects of molecular drifting due to charging, molecular flexibility, errors in image processing, and so on, as a Gaussian envelope function that describes the signal falloff.[30,31] Based on this pioneering research, we would like to revisit these relations and validate the formulations using numerical simulations. Furthermore, it is known that many molecules undergo conformational changes to be functional. To solve structures at higher resolutions, the molecules can be locked in a particular conformation. 
For example, Subramaniam and coworkers used a cell-permeant inhibitor to stabilize β-galactosidase and obtained a CryoEM structure at 2.2 Å.[32] In another work, the same group obtained a structure of glutamate dehydrogenase at 1.8 Å after detailed projection classification, sorting out the images that belong to the most populated conformation.[33] Here, we set out to investigate the effects of intrinsic molecular motion using a structure ensemble to simulate CryoEM single particle images, taking the reality of heterogeneous conformations into consideration. Consequently, we proposed a framework that uses these factors to predict the structure resolutions. The numerical simulation results were used to estimate the free parameters. The statistics from the resolved structures are consistent with the proposed model.
The model originally proposed by Rosenthal and Henderson (the RH model) and later elaborated by Liao and Frank connects several key factors in the CryoEM method using the following equation:[26]
The SNR (defined as the ratio between variances of signals and noises) at a resolution shell k is effectively represented by
The structure of GroEL (PDB ID: 1XCK,[34]) was used as the model system in the numerical simulations (see Fig.
The Gaussian noise (GN) model was proposed to describe the dependency of the model resolution on the number of projections and the noise variance.
The TF model describes how thermal fluctuations of the molecules influence the model resolution. The RMSD (root-mean-square deviation) is one quantity that measures the difference between structures, and here we used the mean square of the pairwise RMSD of a structure ensemble to quantify the structure fluctuations and to mimic the Debye–Waller factor. The following formula is proposed to relate the thermal fluctuation level, the number of projection images, and the achievable map resolution:
For the Gaussian noise model, the SPIDER 22.03 package[35] was used to generate the simulation data. The atomic model of GroEL was first converted to a density map (voxel size = (0.86 Å)³), and then projection images were simulated at orientations generated with the successive orthogonal rotation sample approach.[36] The simulated projections were first convolved with the contrast transfer function (CTF) for defocus values ranging from 1.0 μm to 3.0 μm, and Gaussian-distributed noise was then added according to the desired SNR. In this case, the CTF was not modulated by the envelope function. The image simulation process is summarized in Fig.
To simulate the heterogeneity in the structures, we first obtained a set of diverse structures to form a structure ensemble based on the GroEL structure. Without loss of generality, the structural ensemble was generated using the normal mode perturbation approach. The ‘ProDy’ program based on an anisotropic elastic network model was used to compute the normal modes and the eigenvalue spectrum.[37] Since the functionally relevant motions are highly collective, the three normal modes corresponding to the lowest frequencies were used to generate the perturbed structures. Specifically, the original structure (gray colored in Fig.
The projection image simulation procedure is essentially the same as previously described for the case of a single 3D structure, except that images forming a dataset were simulated based on randomly selected 3D structures from the corresponding group. Consequently, each simulated dataset has the following controlled parameters: the number of projections, SNR, and the structure heterogeneity due to thermal fluctuations.
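As an illustration of how the structure heterogeneity of such a dataset can be quantified, the mean square of the pairwise RMSD over an ensemble may be computed as in the following minimal sketch. It assumes that the conformers have already been superimposed and share the same atom ordering; the function names and the toy ensemble are hypothetical and are not part of the workflow described here.

```python
import numpy as np

def pairwise_rmsd(coords):
    """RMSD between every pair of conformers.

    coords: array of shape (M, N, 3) holding M pre-aligned conformers
    with N atoms each (assumption: superposition was done beforehand).
    Returns an (M, M) symmetric matrix of RMSD values.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcast to all conformer pairs: shape (M, M, N, 3).
    diff = coords[:, None, :, :] - coords[None, :, :, :]
    # Squared displacement per atom, averaged over atoms, then square root.
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))

def mean_square_pairwise_rmsd(coords):
    """Mean of RMSD^2 over all distinct conformer pairs."""
    rmsd = pairwise_rmsd(coords)
    upper = np.triu_indices(rmsd.shape[0], k=1)  # each pair counted once
    return float((rmsd[upper] ** 2).mean())

# Toy ensemble (hypothetical): 8 conformers of a 50-atom structure,
# each perturbed by random displacements around a common reference.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 3))
ensemble = reference + 0.5 * rng.normal(size=(8, 50, 3))
print(mean_square_pairwise_rmsd(ensemble))
```

The all-pairs broadcast keeps the sketch short but stores M × M × N × 3 numbers, so for a molecule as large as GroEL a loop over conformer pairs would be more memory-friendly.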
The resolution determination was based on the Fourier shell correlation (FSC) implemented in SPIDER with the gold standard rule at the cutoff level of FSC = 0.143. The simulation data were split randomly into two half subsets, and each reconstruction was carried out either using back projection with known orientations or using the iterative reconstruction method based on the Bayesian maximum likelihood algorithm implemented in the Relion 1.4 package.[4] When the standard reconstruction procedures were used to build the electron density maps, each dataset was processed five times and the best resolutions were used in the final analysis.
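The resolution criterion can be illustrated with a minimal NumPy sketch of the FSC between two half maps. This is not the SPIDER implementation: it assumes cubic maps of equal size, bins Fourier voxels into integer-radius shells, and reports the first shell at which the FSC falls below the cutoff, without interpolation. The function names and the toy maps are hypothetical.

```python
import numpy as np

def fourier_shell_correlation(map1, map2, voxel_size):
    """Return (spatial_frequency, fsc) for two cubic maps of equal shape."""
    if map1.shape != map2.shape or len(set(map1.shape)) != 1 or map1.ndim != 3:
        raise ValueError("expected two cubic 3D maps of identical shape")
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Integer shell index (rounded radius) of every Fourier voxel.
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    n_shells = n // 2 + 1            # keep shells up to the Nyquist frequency
    keep = shell < n_shells
    def shell_sum(values):
        return np.bincount(shell[keep], weights=values.ravel()[keep],
                           minlength=n_shells)
    cross = shell_sum(np.real(f1 * np.conj(f2)))
    power1 = shell_sum(np.abs(f1) ** 2)
    power2 = shell_sum(np.abs(f2) ** 2)
    denom = np.sqrt(power1 * power2)
    fsc = np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)
    spatial_frequency = np.arange(n_shells) / (n * voxel_size)
    return spatial_frequency, fsc

def resolution_at_cutoff(spatial_frequency, fsc, cutoff=0.143):
    """Resolution (same length unit as voxel_size) at the first shell with
    FSC below the cutoff; falls back to the Nyquist limit otherwise."""
    below = np.nonzero(fsc[1:] < cutoff)[0]   # skip the zero-frequency shell
    if below.size == 0:
        return 1.0 / spatial_frequency[-1]
    return 1.0 / spatial_frequency[below[0] + 1]

# Toy half maps (hypothetical): a smooth blob plus independent Gaussian noise.
n = 32
z, y, x = np.indices((n, n, n)) - n / 2
signal = np.exp(-(x**2 + y**2 + z**2) / (2 * 3.0**2))
rng = np.random.default_rng(0)
half1 = signal + 0.05 * rng.normal(size=signal.shape)
half2 = signal + 0.05 * rng.normal(size=signal.shape)
freq, fsc = fourier_shell_correlation(half1, half2, voxel_size=0.86)
print(resolution_at_cutoff(freq, fsc))
```

The two half maps must come from independently reconstructed half datasets for the 0.143 threshold to be meaningful; the toy maps above only mimic that by sharing a signal and carrying independent noise.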
The resolutions of the reconstructed maps were determined for various combinations of the factors considered in this work. The value of each factor was systematically scanned over a practical range so that the quantitative relationships could be studied by fitting the parameters of the theoretical formulas. The free parameters in the GN and TF models were obtained using the nonlinear least squares (Curve Fitting) module in MATLAB.
A statistical survey was carried out on the structures determined using CryoEM deposited in the EMDB database (
We focused on the models that fulfill the following requirements: (I) the model was deposited between 2017/01/01 and 2018/05/17; (II) the micrographs were recorded using direct electron detection technology; (III) the reported resolution was determined with the gold standard rule at FSC = 0.143; (IV) the model has no higher symmetry; (V) the molecular weight is between 0.5 MDa and 1.0 MDa. As a result, 86 models from EMDB were selected for the map resolution statistics.
The RH model describes the dependency of the resolution on the number of projections, the SNR, the molecular symmetry, and other factors grouped under the umbrella of the B-factor. We focused on the study of three parameters by simplifying the formula to the GN and TF models, as described in Section
The RH model describes the dependency of map resolution on the number of experimental images, and the logarithmic trend of the reconstructed model resolution as a function of the number of measured projections is attributed to the B-factors (due to sample particle drifting, misalignment, numerical interpolation, etc.).[26] Surprisingly, the logarithmic trend was observed for the simulation data without explicitly applying the envelope function e^(Bk²/2) during the projection simulations (see Fig.
In the 2D case, n measurements of the same image will boost the SNR (defined as the ratio of the signal variance to the noise variance) n times if the noise types and levels are the same for all measurements.[30] In the 3D map reconstruction from 2D projection images, we observed the same relationship (see Fig.
In Fig.
Map reconstruction using the back-projection method is only possible if the image orientations are known, so it describes an ideal situation. In practice, the orientations need to be assigned using iterative methods, which converge to the model that best satisfies the constraints of the whole dataset. Here, the auto-refine function implemented in Relion was applied for the model reconstruction. The gold standard rule was also used for resolution determination. To reduce the influence of randomness, the best resolution from five independent runs was reported as the map resolution for the final statistics. The results are summarized in Figs.
Biological macromolecules exist in a thermal environment, and their structures fluctuate around the native states. In many cases, for functional reasons, molecules exist in several meta-stable conformations.[38–40] Using the normal mode perturbation approach, we attempted to simulate intrinsic motions and study their influence on the achievable structure resolutions at various levels of conformational heterogeneity. The TF model described by Eq. (
In practice, both experimental noise and molecular thermal fluctuation have impacts on the CryoEM single particle experimental data. Therefore, it is necessary to derive a model that combines the GN and TF models. Intuitively, the following formulation is devised by treating the Gaussian noise and thermal fluctuation as independent factors that affect the structure resolutions:
To compare our theory and simulation results with experimental data and cross-validate the conclusions, we conducted a systematic survey on the structures determined using CryoEM single particle imaging technology, which are deposited in the EMDB database. It is worth noting that only the subset of structures that met the criteria described in the Methods section were used for the statistics. The resolution distribution nicely resembles the relationship described using the noise model and the thermal fluctuation model. Using these parameters (A_GN, A_TF, B, C in Eqs. (
As shown in Fig.
Based on theoretical frameworks, we used numerical simulations to systematically investigate how the map resolution depends on several factors, including the number of measured projections, the signal-to-noise ratio, and the heterogeneity of the molecules. Two theoretical frameworks were proposed to describe the relationship between these factors and the resolutions of reconstructed maps. The Gaussian noise model is essentially the same as the formula proposed by Rosenthal and Henderson, and we found that the noise term could be combined with the number of projections by defining an ‘effective number of projections’. In the thermal fluctuation model, the intrinsic dynamic character of the sample was considered, and the final resolution could be affected by the fluctuation level of the molecules. The noise and thermal fluctuation frameworks can be used to provide guidelines for estimating the number of projections required to reach the desired resolution, which was validated using the statistics from the structures that were experimentally resolved using the CryoEM single particle imaging method.
Simulation studies were carried out in a controlled manner so that the influence of each individual factor could be decoupled from that of the other factors. We note that uniform orientation sampling is an ideal situation, because orientation bias often exists in experimental datasets. In extreme cases, the missing cone problem can result in strong artifacts in the reconstructed models. Therefore, the FSC criterion measures the model consistency and not necessarily the correctness. Given the scope of this study, we used a uniform distribution of orientations and the back-projection method to ensure that the model reconstructions were carried out properly. These operations help to secure the validity of the FSC criterion for the resolution cutoff. Nonetheless, the correctness of the model should be checked using complementary methods, such as visual inspection of the density maps, local resolution estimation, or validation using biochemical assays.
In this work, the noise was simulated from a Gaussian distribution to study its influence on the model resolutions. This largely simplifies the noise sources, of which the major ones include background scattering from vitreous ice, sample drifting during measurement, misalignment in the orientations, and errors introduced by the orientation discretization. Some of these errors could be mimicked in the simulation framework, for example by using an envelope function with B-factors. However, this is beyond the focus of this study, although it may be a subject for future research.
Despite the simplicity of the formulation, the proposed models can be useful in structure determination with CryoEM single particle imaging methods. One application is to design a data collection strategy to reach the desired resolutions. SNR can be estimated based on sample screening data, and then equations (
The thermal fluctuation model can be used to assess the structure heterogeneity by reconstructing structures with subsets of data under the homogeneous conformation assumption. With the obtained structure, the same simulation study can be carried out to generate a series of curves that are associated with different structural heterogeneity (see Fig.
It should be noted that the conclusions drawn from the numerical simulations are conditional on the simplified theoretical framework and the idealized treatment of the simulation data. The effective number of projections, Ne, is valid under the assumption that the SNR is high enough to allow most projections to be assigned to their correct orientations. Otherwise, Ne will not be a simple product of the SNR and the number of projections. One would expect a much smaller Ne than SNR × Np if the low SNR leads to significant alignment errors. Including more projection images should improve the model resolution in practice, so it is not suggested to exclude a particular set of images that are at high defocus levels, because they can help improve the orientation assignment, especially when the dataset is not extremely large.
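The idea behind the effective number of projections can be illustrated in the simplest 2D setting, where orientation assignment plays no role: averaging n noisy copies of the same image raises the variance-ratio SNR by about a factor of n. The following minimal sketch assumes zero-mean Gaussian noise and perfectly aligned images; the function names and the toy image are hypothetical, and the sketch does not model the alignment errors discussed above.

```python
import numpy as np

def add_gaussian_noise(image, snr, rng):
    """Add zero-mean Gaussian noise whose variance is var(image) / snr."""
    sigma = np.sqrt(image.var() / snr)
    return image + rng.normal(0.0, sigma, size=image.shape)

def measured_snr(clean, noisy):
    """Variance-ratio SNR: signal variance over residual noise variance."""
    return clean.var() / (noisy - clean).var()

def effective_number_of_projections(snr, n_projections):
    """Ne = SNR * Np; meaningful only if orientations are assigned correctly."""
    return snr * n_projections

rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 64))   # stand-in for a noise-free projection
snr, n = 0.1, 100
stack = np.stack([add_gaussian_noise(clean, snr, rng) for _ in range(n)])

single = measured_snr(clean, stack[0])              # roughly snr
averaged = measured_snr(clean, stack.mean(axis=0))  # roughly snr * n
print(single, averaged, effective_number_of_projections(snr, n))
```

In this idealized case, 100 images at SNR = 0.1 carry about the same information as 10 images at SNR = 1, which is the sense in which the two quantities combine into Ne.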
In summary, the resolution-limiting factors in the CryoEM single particle imaging method were investigated under theoretical frameworks with a numerical simulation approach. The results suggest that the resolution of the reconstructed structure strongly depends on the number of measurements, the image quality, and the molecular flexibility. Our results can provide guidance for designing appropriate experimental strategies for data collection in general and for model reconstruction in the case of molecules with heterogeneous conformations.
[1] | |
[2] | |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] | |
[12] | |
[13] | |
[14] | |
[15] | |
[16] | |
[17] | |
[18] | |
[19] | |
[20] | |
[21] | |
[22] | |
[23] | |
[24] | |
[25] | |
[26] | |
[27] | |
[28] | |
[29] | |
[30] | |
[31] | |
[32] | |
[33] | |
[34] | |
[35] | |
[36] | |
[37] | |
[38] | |
[39] | |
[40] | |
[41] | |
[42] | |
[43] |